AMENDMENTS TO THE CLAIMS:
Claims 1-52 (Cancelled) 

53. (Currently amended) A method for a data processing system to efficiently identify
at least one dataset from a collection of datasets according to a query containing
information indicative of desired datasets, wherein each dataset includes one or more data points
and each data point corresponds to at least one of a word, a phrase, a sentence, a color, a
typography, a punctuation, a picture, and a character string, the method comprising the
machine-executed steps:

constructing a semantic vector for each dataset; 

receiving the query containing information indicative of desired datasets; 
constructing a semantic vector for the query; 

comparing the semantic vector for the query to the semantic vector of each dataset; 
selecting datasets whose semantic vectors are closest in distance to the semantic vector for 
the query; and 

outputting information of the selected datasets to be corresponding to the desired datasets
identified in the query;

wherein: 

the query or each of the datasets includes at least one data point; and 

the semantic vector for the query or each of the datasets is constructed by the steps of: 



for each data point, constructing a table for storing information indicative of identifying a
relationship between each data point and predetermined categories corresponding to dimensions 
in the semantic space; 

determining the significance of each data point with respect to the predetermined 
categories, wherein the significance represents a relative strength of each data point
relative to each of the predetermined particular categories, or a degree of relevance of each data 
point relative each of the predetermined particular categories; 

constructing a semantic vector for each data point, wherein each semantic vector has 
dimensions equal to the number of predetermined categories and represents the relative strength 
of its corresponding data point with respect to each of the predetermined categories; and 

combining based on the semantic vector for each of the at least one data point to
form the semantic vector of the query or each of the datasets.
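
The steps recited in Claim 53 can be illustrated with a minimal sketch. It is not the applicant's implementation: the category names, the significance table, the choice of summing and normalizing the data point vectors, and the use of Euclidean distance are all assumptions made for illustration, since the claim does not fix any of them.

```python
import math

# Hypothetical predetermined categories (dimensions of the semantic space).
CATEGORIES = ["finance", "sports", "medicine"]

# Hypothetical table relating each data point (here, a word) to the
# categories; each entry is the significance of the word for that category.
SIGNIFICANCE = {
    "bank": [0.9, 0.0, 0.1],
    "loan": [1.0, 0.0, 0.0],
    "goal": [0.1, 0.9, 0.0],
    "match": [0.0, 0.8, 0.2],
    "doctor": [0.0, 0.1, 0.9],
}


def point_vector(point):
    """Semantic vector of one data point; unknown points map to zeros."""
    return SIGNIFICANCE.get(point, [0.0] * len(CATEGORIES))


def semantic_vector(points):
    """Form a dataset or query vector from its data point vectors.

    Assumption: the vectors are summed and scaled to unit length.
    """
    total = [0.0] * len(CATEGORIES)
    for point in points:
        for i, weight in enumerate(point_vector(point)):
            total[i] += weight
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm else total


def distance(u, v):
    """Euclidean distance between two semantic vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def search(query_points, datasets, k=1):
    """Return the names of the k datasets closest to the query vector."""
    query_vec = semantic_vector(query_points)
    ranked = sorted(
        datasets,
        key=lambda name: distance(query_vec, semantic_vector(datasets[name])),
    )
    return ranked[:k]


# Hypothetical collection of datasets, each a list of data points.
DATASETS = {
    "doc_finance": ["bank", "loan"],
    "doc_sports": ["goal", "match"],
    "doc_medicine": ["doctor"],
}
```

Under these assumptions, `search(["loan"], DATASETS)` selects the dataset whose vector lies nearest the query vector in the three-dimensional category space.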

54. (Original) The method of Claim 53, wherein the datasets correspond to
documents and the query is a natural language query. 

55. (Previously Presented) The method of Claim 53, further comprising the steps: 
performing a second search for datasets within the collection of datasets, wherein the
second search using a method other than semantic vectors;

combining the two search results to obtain a combined weighted score for each dataset in 
either of the two search results; 

selecting datasets whose combined weighted score is largest. 
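
The combination recited in Claim 55 might look like the following sketch. The claim does not say how the two result sets are weighted, so the linear weighting, the default weights, and the treatment of a dataset missing from one result set as scoring zero there are assumptions; the function names are hypothetical.

```python
def combine_scores(semantic_scores, other_scores, w_semantic=0.5, w_other=0.5):
    """Combined weighted score for every dataset in either result set.

    Assumption: a dataset absent from one result set contributes 0.0 for it.
    """
    names = set(semantic_scores) | set(other_scores)
    return {
        name: w_semantic * semantic_scores.get(name, 0.0)
        + w_other * other_scores.get(name, 0.0)
        for name in names
    }


def top_k(combined, k=1):
    """Names of the k datasets with the largest combined score."""
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(combined, key=lambda name: (-combined[name], name))[:k]
```

Here higher scores are assumed to mean better matches, so a distance from the semantic vector search would first need to be converted to a similarity.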

56. (Original) The method of Claim 53, further comprising a step of clustering the
selected datasets in real time.

57. (Currently amended) A method for efficiently identifying data points in a semantic
lexicon related to a dataset, wherein the dataset includes one or more data points and each data 
point corresponds to at least one of a word, a phrase, a sentence, a typography, a punctuation, 
and a character string, the method comprising the machine-executed steps: 

constructing a semantic vector for the dataset; 

comparing the semantic vector for the dataset to a semantic vector of each of the data
points in the semantic lexicon; 

selecting data points whose semantic vectors are closest in distance to the semantic vector 
for the dataset; and 

associating said selected data points to said dataset;

wherein: 

the dataset includes at least one data point; and

the semantic vector for the dataset is constructed by the steps of: 

for each data point, constructing a table for storing information indicative of identifying a 
relationship between each data point and predetermined categories corresponding to dimensions 
in the semantic space; 

determining the significance of each data point with respect to the predetermined 
categories, wherein the significance represents a relative strength of each data point relative to 
each of the predetermined particular categories, or a degree of relevance of each data point 
relative each of the predetermined particular categories;

constructing a semantic vector for each data point, wherein each semantic vector has 
dimensions equal to the number of predetermined categories and represents the relative strength 
of its corresponding data point with respect to each of the predetermined categories; and

combining based on the semantic vector for each of the at least one data point to
form the semantic vector of the dataset.

58. (Original) The method of Claim 57, wherein the dataset is a document and the 
data points are words. 

59. (Original) The method of Claim 57, wherein the dataset is a natural language 
query in a search system and the data points are words. 

Claims 60-64 (Cancelled) 

65. (Currently amended) A system for identifying at least one data set from a 
collection of datasets according to a query containing information indicative of desired datasets, 
wherein each dataset includes one or more data points and each data point corresponds to at least 
one of a word, a phrase, a sentence, a color, a typography, a punctuation, a picture, and a 
character string, the system comprising: 

a computer configured to: 

construct a semantic vector for each dataset; 

receive the query containing information indicative of desired datasets; 
construct a semantic vector for the query; 

compare the semantic vector for the query to the semantic vector of each dataset; 

select datasets whose semantic vectors are closest in distance to the semantic 
vector for the query; and 

output information of the selected datasets to be corresponding to the
desired datasets identified in the query;

wherein:



the query or each of the datasets includes at least one data point; and 
the semantic vector for the query or each of the datasets is constructed by the machine- 
executed steps of: 

for each data point, constructing a table for storing information indicative of identifying a
relationship between each data point and predetermined categories corresponding to dimensions 
in the semantic space; 

determining the significance of each data point with respect to the predetermined 
categories, wherein the significance represents a relative strength of each data point relative to 
each of the predetermined particular categories, or a degree of relevance of each data point 
relative each of the predetermined particular categories;

constructing a semantic vector for each data point, wherein each semantic vector has 
dimensions equal to the number of predetermined categories and represents the relative strength 
of its corresponding data point with respect to each of the predetermined categories; and

combining based on the semantic vector for each of the at least one data point to
form the semantic vector of the query or each of the datasets.

Claims 66-70 (Cancelled)

71. (Currently amended) A tangible computer-readable medium carrying one or more
sequences of instructions for efficiently identifying at least one data set from a collection of
datasets according to a query containing information indicative of desired datasets, each
dataset including one or more data points and each data point corresponding to at least one of a
word, a phrase, a sentence, a color, a typography, a punctuation, a picture, and a character string,
wherein execution of the one or more sequences of instructions by one or more processors causes
the one or more processors to perform the steps of:

constructing a semantic vector for each dataset; 

receiving the query containing information indicative of desired datasets; 
constructing a semantic vector for the query; 

comparing the semantic vector for the query to the semantic vector of each dataset; 
selecting datasets whose semantic vectors are closest in distance to the semantic vector for 
the query; and 

outputting information of the selected datasets to be corresponding to the
desired datasets identified in the query;

wherein: 

the query or each of the datasets includes at least one data point; and 

the semantic vector for the query or each of the datasets is constructed by the steps of: 

for each data point, constructing a table for storing information indicative of identifying a
relationship between each data point and predetermined categories corresponding to dimensions
in the semantic space;

determining the significance of each data point with respect to the predetermined 
categories, wherein the significance represents a relative strength of each data point relative to 
each of the predetermined particular categories, or a degree of relevance of each data point 
relative each of the predetermined particular categories;

constructing a semantic vector for each data point, wherein each semantic vector has
dimensions equal to the number of predetermined categories and represents the relative strength 
of its corresponding data point with respect to each of the predetermined categories; and 

combining based on the semantic vector for each of the at least one data point to
form the semantic vector of the query or each of the datasets.

Claims 72-75 (Cancelled) 